Front Matter

Back to top

Objective

R is a great data analysis and stats language and has lots of practical applications in many businesses.

Using it can result in a fantastic return on investment as R is free to use, but you’ll only get that ROI when you’re using it in a robust infrastructure and utilising sensible development practices.

This training day takes you through the basics of R development, showing you the best practices along the way, then shows you how to setup the Linux infrastructure needed to do team development and to deliver reporting across the company. ## About Me

itsalocke.com | github.com/stephlocke | @SteffLocke

R

Back to top

About R

R is an integrated suite of software facilities for data manipulation, calculation and graphical display

  • Open source
  • In-memory, single-core (by default), multi-platform
  • Extensible environment
  • Super-useful
  • http://www.r-project.org/

How it hangs together

Top 10 packages

  1. data.table
  2. ggplot2
  3. knitr
  4. shiny
  5. rmarkdown
  6. RODBC
  7. readxl
  8. stringr
  9. dplyr
  10. httr

I lied…

  1. foreach
  2. doMC
  3. magrittr
  4. xtable
  5. devtools
  6. testthat
  7. DiagrammeR
  8. git2r
  9. rvest
  10. zoo

Installation

Back to top

R

  • Download latest base R exe file from cran.rstudio.com
  • Also download Rtools latest exe
  • Install R then Rtools - if 64bit, install both 32 and 64 as it saves you hassle with other drivers

Rstudio

  • Go to rstudio.com
  • Select correct install file for your PC, and install

Git

Configuration

  • Open Rstudio
  • Go to Tools> global options and set rstudio to point at the location of your git.exe file Git exe location
  • Setup your details in git by going to git > shell in Rstudio or opening up command-prompt
    • Run git config --global user.name="Your name"
    • Run git config --global user.email="email@addre.ss"
  • Use Packages > Install to install devtools and installr
  • Install Rtraining by executing

    library(devtools)
    # Case-sensitive!
    install_github("stephlocke/Rtaining")

Syntax & structure

Back to top

Basic operators

Action Operator Example
Subtract - 5 - 4 = 1
Add + 5 + 4 = 9
Multiply * 5 * 4 = 20
Divide / 5 / 4 = 1.25
Raise to the power ^ 5 ^ 4 = 625
Modulus %% 10 %% 4 = 2
Absolute remainder %/% 9 %/% 4 = 2
Basic sequence : sum(1:3) = 6

Comparison operators

Action Operator Example
Less than < 5 < 5 = FALSE
Less than or equal to <= 5 <= 5 = TRUE
Greater than > 5 > 5 = FALSE
Greater than or equal to >= 5 >= 5 = TRUE
Exactly equal == (0.5 - 0.3) == (0.3 - 0.1) is FALSE, 2 == 2 is TRUE
Not equal != (0.5 - 0.3) != (0.3 - 0.1) is TRUE, 2 != 2 is FALSE
Equal all.equal() all.equal(0.5 - 0.3,0.3 - 0.1) is TRUE

States

States Representation
True TRUE 1
False FALSE 0
Empty NULL
Unknown NA
Not a number e.g. 0/0 NaN
Infinite e.g. 1/0 Inf

Logical operators

Action Operator Example
Not ! !TRUE is FALSE
And & TRUE & FALSE is FALSE, c(TRUE,TRUE) & c(FALSE,TRUE) is FALSE, TRUE
Or | TRUE | FALSE is TRUE, c(TRUE,FALSE) | c(FALSE,FALSE) is TRUE, FALSE
Xor xor() xor(TRUE,FALSE) is TRUE
Bitwise And && c(TRUE,TRUE) && c(FALSE,TRUE) is FALSE
Bitwise Or || c(TRUE,FALSE) || c(FALSE,FALSE) is TRUE
In %in% "Red" %in% c("Blue","Red") is TRUE
Not in !( x %in% y) or Hmisc::%nin% "Red" %nin% c("Blue","Red") = FALSE

Control constructs

Type Implementation Example
If if(condition) {dosomething} if(TRUE) { 2 } is 2
If else if(condition) {do something} else {do something different} or ifelse(condition, do something, do something else) if(FALSE) { 2 } else { 3 } is 3 ifelse(FALSE, 2, 3) is 3
For loop for(i in seq) {dosomething} or foreach::foreach(i=1:3) %do% {something} foreach(i=1:3) %do% {TRUE} is TRUE, TRUE, TRUE
While loop while(condition) {do something } a<-0 ; while(a<3){a<-a+1} ; a is 3
Switch switch(value, …) switch(2, "a", "b") is b
Case memisc::cases(…) cases("pi<3"=pi<3, "pi=3"=pi==3,"pi>3"=pi>3) is pi>3

NB: If you find yourself using a loop, there’s probably a better, faster solution

Assignment operators

Action Operator Example
Create / update a variable <- a <- 10

NB: There are others you could use, but this is the best practice

Accessors

Action Operator Example
Use public function from package :: memisc::cases()
Use private function from package ::: optiRum:::pounds_format()
Get a component e.g a data.frame column $ iris$Sepal.Length
Extract a property from a class @ Won’t be used in this course
Refer to positions in a data.frame or vector [ ] iris[5:10,1]
Refer to item in a list [[ ]] list(iris=iris,mtcars=mtcars)[["iris"]]

Meta-operators

Action Operator Example
Comment # # This is my comment
Help ? ?data.table
Identifier ` irisDT[ , `:=`(CreatedDate = Sys.Date())]

Data types

Back to top

Primitive data types

Data type Example
Integer 1
Logical TRUE
Numeric 1.1
String / character “Red”
Factor (enumerated string) “Amber” or 2 in c(“Red”,“Amber”,“Green”)
Complex i
Date “2015-04-24”

Compound data types

Data type Info Construction example(s)
Vector A 1D set of values of the same data type c(1,“a”) , 1:3 , LETTERS
Matrix A 2D set of values of the same data type matrix(LETTERS,nrow=13, ncol=2) , rbind(1:5,2:6)
Array An nD set of values of the same data type array(LETTERS, c(13,2))
Data.frame A 2D set of values of different data types data.frame(a=1:26, b=LETTERS)
List A collection of objects of various data types list(vector=c(1,“a”), df=data.frame(a=1:6))
Classes A class is like a formalised list and can also contain functions i.e. methods Won’t be covered in this class

NB: Most of my work uses vectors, data.tables (a souped up version of data.frames), and lists

Useful functions relating to data types

Function Use
is.[data type] Whether a vector is of a particular type
as.[data type] Attempts to coerce a vector to a data type
str Structure of an object including class/data type, dimensions
class The class(es)/data type(s) an object belongs to
summary Summarises an object
dput Get R code that recreates an object
unlist Simplify a list to a vector
dim Dimensions of a data type

IO

Back to top

Requirements

  • suggested package: data.table
  • suggested package: readr
  • suggested package: readxl

In

Format Functions
CSV read.csv , data.table::fread , readr::read_csv
Excel readxl::read_excel
Database RODBC::sqlQuery , DBI::dbGetQuery
SPSS / SAS / Stata haven::read_[prog]

Out

Format Functions
CSV write.csv
Excel openxlsx::write.xlsx
Database RODBC::sqlSave , DBI::dbWriteTable
SPSS / SAS / Stata foreign::write.foreign

Other file types

As well standard formats, there’s a lot of connector packages out there, including a suite for Hadoop.

Wrangling tables

Back to top

Requirements

  • necessary package: data.table (>=v1.9.5)
  • suggested package: optiRum

SQL

data.table is “SQL-like”

DT[i, j, by]
DT[WHERE | JOIN | ORDER, SELECT | UPDATE, GROUP]

data.table behaves like a database

A data.table acts like an in-memory RDBMS:

  • The result of a query is a table
  • Can set clustered primary keys over single or multiple columns
  • Can set non-clustered secondary keys over single or multiple columns
  • Can perform a wide range of relational algebra
  • Can do CRUD operations
  • Think in columns / sets not rows!

data.table differences to SQL Server

There are some differences that need to be mentioned:

  • A primary key can have repeat values
  • No constrained foreign keys
  • Joins are done by position in key so tblA with key A, and tblB with key C,A would join on tblA[A] and tblB[C]
  • It’s inherently dynamic so vectors of column names etc. can be provided to it
  • DELETE doesn’t exist explicitly
  • It’s case-sensitive

data.table cookbook

Single table basics

Task Generic syntax Example(s)*
CREATE data.table(…) setDT() data.table(a=1:3 , b=LETTERS[1:3]) data.table(iris)
PRIMARY KEY data.table(…,key) setkey() data.table(a=1:3 , b=LETTERS[1:3], key="b") setkey(data.table(iris),Species)
SELECT basic DT[ , .( cols )] irisDT[ , .(Species, Sepal.Length)]
SELECT alias DT[ , .( a=col )] irisDT[ , .(Species, Length=Sepal.Length)]
SELECT COUNT DT[ , .N] irisDT[ ,.N]
SELECT COUNT DISTINCT DT[ , uniqueN(cols)] irisDT[ ,uniqueN(.SD)]
SELECT aggregation DT[ , .( sum(col) , .N )] irisDT[ , .(Count=.N, Length=mean(Sepal.Length))]
WHERE exact on primary key DT[value] DT[value, ] irisDT["setosa"] irisDT["setosa", .(Count=.N)]
WHERE DT[condition] DT[condition, j, by] irisDT[Species=="setosa"] irisDT[Species=="setosa", .(Count=.N)]
WHERE BETWEEN DT[between(col, min, max)] DT[ col %between% c(min,max) ] irisDT[between(Sepal.Length, 1, 5)] irisDT[Sepal.Length %between% c(1,5)]
WHERE LIKE DT[like(col,pattern)] DT[ col %like% pattern ] irisDT[like(Species,"set")] irisDT[Species %like% "set"]
ORDER asc. DT[order(cols)] DT[order(cols), j, by] irisDT[order(Species)]
ORDER desc. DT[order(-cols)] DT[order(-cols), j, by] irisDT[order(-Species)]
ORDER multiple DT[order(cols)] DT[order(cols), j, by] irisDT[order(-Species, Petal.Width)]
GROUP BY single DT[i, j, by] irisDT[ ,.N, by=Species]
GROUP BY multiple DT[i, j, by] irisDT[ ,.N, by=.(Species,Width=Petal.Width)]
TOP head(DT, n) head(irisDT)
HAVING DT[i, j, by][condition] irisDT[ , .(Count=.N), by=Species][Count>25]
Sub-queries DT[…][…][…] irisDT[ , .(Sepal.Length=mean(Sepal.Length)), by=Species][Sepal.Length>6, .(Species)]

* Uses irisDT <- data.table(iris)

CRUD

Task Generic syntax Example(s)*
INSERT DT <- rbindlist(DT, newDT) irisDT<-rbindlist( irisDT, irisDT[1] )
READ aka SELECT (see above) DT[ , .( cols )] irisDT[ , .(Species, Sepal.Length)]
UPDATE / ADD column DT[ , a := b ] irisDT[ , Sepal.Area := Sepal.Width * Sepal.Length]
UPDATE / ADD multiple columns DT[ , `:=`(a = b, c = d) ] irisDT[ , `:=`(CreatedDate = Sys.Date(), User = "Steph")]
UPDATE / ADD multiple columns by reference DT[ , (newcols):=vals ] irisDT[ , c("a","b"):=.(1,2)]
DELETE DT <- DT[!condition] irisDT <- irisDT[!(Species=="setosa" & Petal.Length>=1.5)]
DROP table DT <- NULL irisDT<-NULL
DROP column DT[,col:=NULL] iristDT[,Species:=NULL]

* Uses irisDT <- data.table(iris)

Metadata

Task Generic syntax Example(s)*
Structure str(DT) str(irisDT)
Column Names colnames(DT) colnames(irisDT)
Summary stats summary(DT) summary(irisDT)
Retrieve primary key info key(DT) key(irisDT)
List all data.tables tables() tables()

* Uses irisDT <- data.table(iris)

Relationships

Task Generic syntax Example(s)*
INNER JOIN Y[X, nomatch=0] lookupDT[irisDT,nomatch=0]
LEFT JOIN Y[X] lookupDT[irisDT]
FULL JOIN merge(X, Y, all=TRUE) merge(irisDT, lookupDT, all=TRUE)
CROSS JOIN optiRum::CJ.dt(X,Y) CJ.dt(irisDT, lookupDT)
UNION ALL rbindlist( list(X,Y), fill=TRUE ) rbindlist( list(irisDT, lookupDT), fill=TRUE )
UNION unique( rbindlist( list(X,Y), fill=TRUE ) ) unique( rbindlist( list(irisDT, lookupDT), fill=TRUE ) )
JOIN and AGGREGATE Y[X, cols, by] lookupDT[irisDT,.(count=.N),by=Band]

* Uses:

irisDT   <- data.table(iris, key="Species")
lookupDT <- data.table(Species=c("setosa", "virginica", "Blah"), Band=c("A", "B", "A"), key="Species")

Intermediate tasks

Task Generic syntax Example(s)*
SELECT dynamically DT[ , colnames , with=FALSE] , DT[ , .SD , .SDcols=colnames cols<-colnames(irisDT); irisDT[ , cols, with=FALSE] cols<-colnames(irisDT); irisDT[ , .SD, .SDcols=colnames]
GROUP BY dynamically DT[ , …, by=colnames] irisDT[,.N,by=c("Species")]
GROUP BY, ORDER BY group DT[ , … , keyby] irisDT[,.N,keyby=Species]
UPDATE / ADD column of summary stat DT[ , a := b ] irisDT[ , All.SL.Mean:=mean(Sepal.Length)]
UPDATE / ADD column by group DT[ , a := b, by] irisDT[ , Species.SL.Mean:=mean(Sepal.Length), by=Species]
TOP by group DT[ , head(.SD), by] irisDT[ , head(.SD,2) , by=Species]
Largest record DT[ which.max(col) ] irisDT[ which.max(Sepal.Length) ]
Largest record by group DT[ , .SD[ which.max(col) ], by] irisDT[ , .SD[ which.max(Sepal.Length) ], by=Species]
Cumulative total DT[ , cumsum(col) ] irisDT[ , cumsum(Sepal.Width)]
NEGATIVE SELECT DT[ , .SD, .SDcols=-“colname”] irisDT[ , .SD, .SDcols=-"Species"]
RANK DT[ , frank(col) ] irisDT[ , frank(Sepal.Length,ties.method="first")]
AGGREGATE multiple columns DT[ , lapply(.SD, sum)] irisDT[ , lapply(.SD,sum), .SDcols=-"Species"]
AGGREGATE multiple columns by group DT[ , lapply(.SD, sum), by] irisDT[ , lapply(.SD,sum), by=Species]
COUNT DISTINCT multiple columns by group DT[ , lapply(.SD, uniqueN), by] irisDT[ , lapply(.SD,uniqueN), by=Species]
COUNT NULL multiple columns by group DT[ , lapply(.SD, function(x) sum(is.na(x))), by] irisDT[ , lapply(.SD,function(x) sum(is.na(x))), by=Species]
PIVOT data - to single value column melt(DT,…) melt(irisDT)
PIVOT data - to aggregate dcast(DT, a~b, function) dcast(melt(irisDT), Species ~ variable, sum)
Convert a large data.frame or list setDT() iris<-iris; setDT(iris)
ROW_NUMBER DT[ , .I] irisDT[ , .I]
GROUP number DT[, .GRP ,by] irisDT[ , .GRP, by=Species]

* Uses irisDT <- data.table(iris)

Advanced tasks

Task Generic syntax Example(s)*
GROUP BY each new incidence of group DT[ , cols , by=(col, rleid(col))] irisDT[order(Sepal.Length), .N, by=.(Species, rleid(Species))]
Calculate using (previous/next) N row DT[ , shift( cols, n)] irisDT[ , prev.Sepal.Length:=shift(Sepal.Length), by=Species ]
ORDER underlying table setorder() setorder(irisDT,Species)
JOIN & GROUP by keys X[Y, .N, by=.EACHI] irisDT[lookupDT, .N, by=.EACHI]
TRANSPOSE data.table transpose(DT) transpose(irisDT)
Split string to columns DT[, tstrsplit(charCol, pattern) ] irisDT[ , tstrsplit(as.character(Species),"e")]

* Uses:

irisDT   <- data.table(iris, key="Species")
lookupDT <- data.table(Species=c("setosa", "virginica", "Blah"), Band=c("A", "B", "A"), key="Species")

More resources

Making charts

Back to top

DISCLAIMER

This intro covers the charting package ggplot2.

The “base” charting functionality will not be covered because it’s much more difficult to achieve good looking results quickly and I don’t believe in that much effort for so little benefit!

Requirements

  • necessary package: ggplot2
  • suggested package: RColorBrewer
  • suggested package: ggthemes
  • suggested package: optiRum

ggplot2

ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.

Glossary

Term Explanation Example(s)
plot A plot using the grammar of graphics ggplot()
aesthetics attributes of the chart colour, x, y
mapping relating a column in your data to an aesthetic
statistical transformation a translation of the raw data into a refined summary stat_density()
geometry the display of aesthetics geom_line(), geom_bar()
scale the range of values axes, legends
coordinate system how geometries get laid out coord_flip()
facet a means of subsetting the chart facet_grid()
theme display properties theme_minimal()

Constructing a chart - a step by step process

  1. Create the base plot (doesn’t work on it’s own)
library(ggplot2)

p <- ggplot(data=iris)
  1. Add aesthetic mappings (doesn’t work on it’s own)
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, colour=Species))
  1. Add a geometry
p <- p + geom_point()
p

  1. (Optional) Add a statistic
p <- p + stat_boxplot(fill="transparent")
p
## Warning: position_dodge requires constant width: output may be incorrect
## Warning: position_dodge requires non-overlapping x intervals

  1. (Optional) Alter coordinate system
p <- p + coord_flip()
p
## Warning: position_dodge requires constant width: output may be incorrect
## Warning: position_dodge requires non-overlapping x intervals

  1. (Optional) Facet the chart
p <- p + facet_grid(.~Species)
p

  1. (Optional) Amend look and feel
p <- p + optiRum::theme_optimum()
p

Constructing a chart - a one-step process

ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, colour=Species)) + 
  geom_point() +
  stat_boxplot(fill="transparent") +
  # coord_flip() + # Commented out
  facet_grid(.~Species) +
  optiRum::theme_optimum()

More resources

Making documents

Back to top

Requirements

  • necessary package: knitr
  • necessary package: rmarkdown
  • necessary software: pandoc
  • recommended software: MiKTeX

Using R for documents

Producing documents / documentation directly in R means that you closely interweave (knit) your analysis and R code together. This reduces rework time when you want to change or extend your code, it reduces time to produce new versions, and because it’s code it’s easier to apply strong software development principles to it.

Oh, and you don’t need to spend hours making text boxes in powerpoint! Win ;-)

There are two languages which you can knit your r code into:

  • markdown
  • LaTeX (pronounced lay-tech alas)

Markdown is great for very quick generation and light (or css driven) styling and is what this section focusses on. LaTeX is excellent for producing stunning, more flexible documents.

rmarkdown standard text

The following text is the default text that gets created when you produce a new rmarkdown file in rstudio

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

rmarkdown standard documentation

The following text is part of the standard documentation on rmarkdown. I pull it from github.com/rstudio/rmarkdown and integrate it using knitr. It is better than I could produce and the act of integrating it gives an extra example of useful ways to build documents.

This document provides quick references to the most commonly used R Markdown syntax. See the following articles for more in-depth treatment of all the capabilities of R Markdown:

Emphasis

*italic*   **bold**

_italic_   __bold__

Headers

# Header 1

## Header 2

### Header 3

Lists

Unordered List:

* Item 1
* Item 2
    + Item 2a
    + Item 2b

Ordered List:

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b

R Code Chunks

R code will be evaluated and printed

```{r}
summary(cars$dist)
summary(cars$speed)
```

Inline R Code

There were 50 cars studied

Images

Images on the web or local files in the same directory:

![alt text](http://example.com/logo.png)

![alt text](figures/img.png)

Blockquotes

A friend once said:

> It's always better to give
> than to receive.

Plain Code Blocks

Plain code blocks are displayed in a fixed-width font but not evaulated

```
This text is displayed verbatim / preformatted
```

Inline Code

We defined the `add` function to
compute the sum of two numbers.
LaTeX Equations

LaTeX Equations

Inline equation:

$equation$

Display equation:

$$ equation $$

Horizontal Rule / Page Break

Three or more asterisks or dashes:

******

------

Tables

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell
Reference Style Links and Images

Manual Line Breaks

End a line with two or more spaces:

Roses are red,
Violets are blue.

Miscellaneous

superscript^2^

~~strikethrough~~

Making reports

Back to top

What’s Shiny?

  • An interactive report framework
  • An R package that is available free (as is the server edition)
  • Uses “modern web standards” like bootstrap under the hood
  • Cross-browser/Cross-device compatibility, out of box!

Quick example

library(data.table)
library(shiny)
defaultdisplay<-list(
    width="100%", height="75%"
  )
shinyAppDir(
  system.file("examples/06_tabsets", package="shiny"),
  options = defaultdisplay
)

Shiny structure

Typical Contents

A shiny application report consists of two functions:

  • shinyServer()
  • shinyUI()

One says what to execute and the other states how to present it. Do all data manipulation, chart production in shinyServer()

“Lite” Contents

defaultdisplay<-list(width="100%", height="75%")

shinyApp(
    ui      = fluidPage(),
  , server  = function(input, output) {}
  , options = defaultdisplay
)

Files

You typically split into two files:

  • server.R containing shinyServer()
  • ui.R containing shinyUI()

This can then be run with runApp()

You can do a single file example app.R which contains both functions but this is typically better for very short apps.

Front-end layout

Use these just inside shinyUI() to produce a layout

##       Page Types
## 1:     basicPage
## 2: bootstrapPage
## 3:     fixedPage
## 4:     fluidPage
## 5:    navbarPage

Typical Inputs

Dates

shinyApp(
  ui = fluidPage(dateInput("datePicker", "Pick a date:", 
                           format="dd/mm/yy"),
                 dateRangeInput("dateRange", "Pick dates:", 
                                start=Sys.Date(), 
                                end=Sys.Date() ) ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

Values

Basic

shinyApp(
  ui = fluidPage(numericInput("vals", "Insert a number:", 
                              value=15, min=10)  ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

Sliders

shinyApp(
  ui = fluidPage(sliderInput("vals", "Insert a number:", 
                             min=0, max=50, value=15)  ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

Text

A single line

shinyApp(
  ui = fluidPage(textInput("char", "Insert text:")  ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

A paragraph

shinyApp(
  ui = fluidPage(tags$textarea(id="charbox", rows=3, 
                               cols=40, "Default value")  ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

Selectors

shinyApp(
  ui = fluidPage(selectInput("multiselect", "Pick favourites:",
                             c("Green","Red","Blue"),
                             multiple=TRUE)  ),
  server = function(input, output) {}
  ,options = defaultdisplay
)

List of input types

##               Input controls
##  1:       checkboxGroupInput
##  2:            checkboxInput
##  3:                dateInput
##  4:           dateRangeInput
##  5:                fileInput
##  6:             numericInput
##  7:            passwordInput
##  8:     registerInputHandler
##  9:       removeInputHandler
## 10:              selectInput
## 11:           selectizeInput
## 12:              sliderInput
## 13:                textInput
## 14: updateCheckboxGroupInput
## 15:      updateCheckboxInput
## 16:          updateDateInput
## 17:     updateDateRangeInput
## 18:       updateNumericInput
## 19:        updateSelectInput
## 20:     updateSelectizeInput
## 21:        updateSliderInput
## 22:          updateTextInput
##               Input controls

Typical Outputs

Input values

shinyApp(
  ui = fluidPage(textInput("char", "Insert text:") ,
                 textOutput("text")  ),
  server = function(input, output) {
    output$text <- renderText(input$char)
  }  ,options = defaultdisplay
)

Basic tables

shinyApp(
  ui = fluidPage(tableOutput("basictable")  ),
  server = function(input, output) {
    output$basictable <- renderTable(head(iris,5))
  }  ,options = defaultdisplay
)

Interactive tables

shinyApp(
  ui = fluidPage(dataTableOutput("datatable")  ),
  server = function(input, output) {
    output$datatable <- renderDataTable(head(iris,5))
  }  ,options = defaultdisplay
)

Charts

shinyApp(
  ui = fluidPage(plotOutput("chart")  ),
  server = function(input, output) {
    output$chart <- renderPlot(pairs(iris))
  }  ,options = defaultdisplay
)

Reactivity

Simple reactivity

  • Make functions that process inputs only when they change
a <- reactive({input$a})
a
  • Use these reactive functions in downstream server items for DRY & to reduce processing effort

An Example

shinyApp(
  ui = fluidPage(textInput("char", "Insert text:") ,
                 textOutput("textA"),textOutput("textB") ),
  server = function(input, output) {
    char<-reactive({rep(input$char,5)})
    output$textA <- renderText(paste(char(),collapse="+"))
    output$textB <- renderText(paste(char(),collapse="-"))
  }
  ,options = defaultdisplay
)

Styling

shinythemes

  • Get a different look and feel with the package shinythemes
  • Uses a number of bootstrap based themes
  • Good-looking quickly, but of course not company branded
  • View themes at bootswatch.com

CSS

  • Shiny outputs html so you can write CSS that works with it
  • Full list of CSS items doesn’t exist, use F12 on chrome or check out selectorgadget via rvest
  • Simple stuff like body, h1 will all work

shinydashboards

Infrastructure

Ad-hoc shiny

  • Rstudio (easiest) or just run directly
  • Use shiny::runApp()
  • Great for “expert” use

Cloud

  • shinyApps.io
  • Deploy with shinyApps package
  • Free for light use
  • Extra management features at increased costs
  • Great for hands-off management

Central server

  • shiny-server
  • Runs on linux
  • Free community edition
  • Extra management features & LDAP auth in Pro Edition (but can roll your own)
  • Great for sensitive and/or db-driven appplications

LaTeX

Back to top

About LaTeX

Pronunciation: lay-tech

LaTeX is an open source markup language with a typesetting engine. It’s been around since the 70s and generally makes awesome documents.

LaTeX is designed to work stand-alone, or integrate with other languages. It’s particularly good with R.

It works in the way rmarkdown does with knitr, but allows for more sophisticated document styling.

Why LaTeX?

  • Seamless integration between code and content
  • Fantastic looking documentation
  • Works on all operating systems and has a number of cloud/internet hosted services
  • Can use markdown and append LaTeX style
  • Use for more polished presentations without having to go into PowerPoint

Get LaTeX

Once you have a LaTeX installation, you can write LaTeX in Rstudio.

In Rstudio, select the File type “R Sweave” which saves as a .Rnw file.

Learn LaTeX

LaTeX is a very deep language, so no attempt is made to teach you here.

One of the easiest ways of getting started with LaTeX (generally) is to pick one of the examples / templates on overleaf.com and play with it.

For using R and LaTeX, you can use the minimal examples on yihui.name/knitr to get started.

See the LaTeX wikibook for lots of info about LaTeX.

Source control

Back to top

version control all the things

SOURCE CONTROL IS FOR ALL THE THINGS

Fundamentals

Source control is important because it provides:

  • a backup system for your code
  • a means of tracking why you made changes
  • a way of sharing code
  • a system for multiple people to work safely
  • another potential way to deploy code manually
  • options for automated testing and deployment
  • audit trails

There are two types of source control systems:

  • centralised
  • distributed

Centralised means that there is a single storage location and to work on a file it must be exclusively checked out. Distributed systems involve taking copies of the code base, making changes, and pushing these back to primary storage location.

Both have their own disadvantages but since with distributed source control you never get the situation where someone’s left a file checked out as they go on holiday and no-one else can use it, I’m a big fan of distributed source control systems.

Git

Git is a distributed source control system.

It integrates neatly into Rstudio, making it easy to source control your analysis.

The git conceptual model (@cthydng)

git model

Git glossary

  • Repository: Your project and it’s history
  • Remote: The (TFS/GitHub/VSO etc) server
  • Clone:Take a linked copy of a repository
  • Fork: Take an unlinked copy of a repository
  • Pull: Update your copy of a repository
  • Add: Include a file in a change
  • Commit: Make the latest state of all the added files the new default state
  • Push: Update the central copy of the repository with your commits

There are more terms. For a friendly glossary see Github’s git glossary, and for an extensive, technical glossary see the official Git glossary

git2r

The package git2r supports a source control workflow directly within R. This means you can continue to use Rstudio for even complex git tasks. And of course there’s always the shell option in Rstudio.

For a handy Git cheatsheet, check out this GitHub one.

The git2r documentation is pretty good. It’s easier though to use once you’ve been utilising the Rstudio GUI for a bit, and dabbling with the command line.

Source control - tfsR

Back to top

## Loading required package: devtools
## Loading required package: tfsR

TFS

In TFS2013 and Visual Studio Online, you can use git repositories inside TFS. This gives the ability to leverage all the advantages of a distributed source control system whilst still giving you all the functionality of TFS.

If you’re using SourceSafe or earlier editions of TFS, UPGRADE for everyone’s sakes.

If you’re not using any source control, consider getting Visual Studio Online as it’s free for 5 users (plus unlimited MSDN users), requires no hosting or maintenance on your part, and has fine grained permissions management.

Working with your TFS site in R with tfsR

If you have/want to work with R using git repositories in TFS (either on-premises or via Visual Studio Online), tfsR saves you having to have Visual Studio (installed on your machine or online), and allows you to directly create git repositories within TFS.

tfsR Information

Setup

You must have a username (often an email address or AD account) and password for connecting. That’s basically it!

Intended use

Since you need to provide credentials to these functions, it is anticipated that you would use these on an ad-hoc basis in much the same way you would use devtools.

It’s a great way to quickly setup a repository if you need to get your stuff into source control.

In terms of actually using your repositories once they’re created, I recommend git2r.

Limitations

  • The active components use a Team Project with Git selected as the engine for it
  • Due to the lack of an API call for creating new projects, it is anticipated that you use this to create repositories within a “parent” or container project.
  • If you need seperate backlogs etc, you’ll have to create new Projects in the GUI
  • This only works currently with basic authentication, OAuth2.0 may follow
  • getTFSProjects will only handle a single TFS URL at a time (httr restriction)
  • createTFSRepository will only handle a single repository at a time, to maintain consistency

Using tfsR

Get info on existing repositories

A great place to start is a) verifying you can connect to your TFS and b) see what projects you could create repositories under

library(tfsR)

tfs          <-"https://stefflocke.visualstudio.com"
authcreds    <-httr::authenticate("tfsexample","UsedForExampl3s")
repositories <-getTFSProjects(tfs,authcreds)

knitr::kable(repositories)
id name url remoteUrl project.id project.name project.url project.state type
ef220f61-3654-4f00-a946-9faf3eb76c52 GitRepoContainer https://stefflocke.visualstudio.com/DefaultCollection/_apis/git/repositories/ef220f61-3654-4f00-a946-9faf3eb76c52 https://stefflocke.visualstudio.com/DefaultCollection/_git/GitRepoContainer f8c183a2-cbd1-43bb-a124-279dc31f7fe5 GitRepoContainer https://stefflocke.visualstudio.com/DefaultCollection/_apis/projects/f8c183a2-cbd1-43bb-a124-279dc31f7fe5 wellFormed Project
5e15f060-a20e-49cc-925f-ecab5975b5a8 rhv04g https://stefflocke.visualstudio.com/DefaultCollection/_apis/git/repositories/5e15f060-a20e-49cc-925f-ecab5975b5a8 https://stefflocke.visualstudio.com/DefaultCollection/GitRepoContainer/_git/rhv04g f8c183a2-cbd1-43bb-a124-279dc31f7fe5 GitRepoContainer https://stefflocke.visualstudio.com/DefaultCollection/_apis/projects/f8c183a2-cbd1-43bb-a124-279dc31f7fe5 wellFormed Sub-Project
d72180e6-bcf1-4061-a7eb-fc2a1722b852 APItest https://stefflocke.visualstudio.com/DefaultCollection/_apis/git/repositories/d72180e6-bcf1-4061-a7eb-fc2a1722b852 https://stefflocke.visualstudio.com/DefaultCollection/GitRepoContainer/_git/APItest f8c183a2-cbd1-43bb-a124-279dc31f7fe5 GitRepoContainer https://stefflocke.visualstudio.com/DefaultCollection/_apis/projects/f8c183a2-cbd1-43bb-a124-279dc31f7fe5 wellFormed Sub-Project

Create a new repository

TFS doesn’t allow you to create new projects so if you want to create a new repository, you have to do it in an existing project or go to the GUI and manually create one (there is no API call for this).

You don’t want to start nesting repositories so you need to make the repository under a top-level project. As such, in this bit of functionality, you provide the name of the top-level project which you want to put your repository, and it goes away and gets a GUID if the project exists and is a top-level project, then tells the API to create a repository in it.

tfs          <-"https://stefflocke.visualstudio.com"
authcreds    <-httr::authenticate("tfsexample","UsedForExampl3s")
parentproj   <-"GitRepoContainer"
newrepo      <- as.character(random::randomStrings(n=1, len=6))
createdrepo  <- createTFSRepository(tfs,authcreds,parentproj,newrepo)

This will provide you with the URL you need to be able to push and pull commits to. In this case it has created a repository called “ZZ5IEP” which can be interacted with at the URL https://stefflocke.visualstudio.com/DefaultCollection/GitRepoContainer/_git/ZZ5IEP

Package development

Back to top

Requirements

  • necessary package: devtools
  • suggested package: testthat

What’s a package?

A package is a collection of functionality designed to achieve one or more purposes. Commonly it is a bundle of functions that help tackle a certain type of analysis.

Packages are great ways to modularise your code and create standardised ways of doing specific tasks in your organisation, like charts (optiRum::theme_optimum()).

The package development Bible

There is an R foundation guide to writing packages. I don’t recommend you start with that! It is however what any package that you submit to the central repository of R packages (CRAN) will be held against - so if you’d like to get a package on CRAN you will need to read this.

The better, more accessible book R packages is by Hadley Wickham and will cover things in a lot of depth but is more accesable and has exercises.

For quick learning abotu devtools you can check out the cheatsheet

How do you build a package?

The easiest way to build a good quality package is to use the package devtools. This is a package designed specifically to make life easier for package developers.

Here is my typical workflow:

library(devtools)
pkg<-"newPackage"
create(pkg)

# Open the project!

library(devtools)

# Add unit test framework
add_test_infrastructure()

# Add CI framework
add_travis()

# Add folder for macro-level help files
use_vignette()

# Add file for providing info about your package
use_package_doc()

# Add a file for storing comments about the release if submitting to CRAN
use_cran_comments()

# Create various useful files
file.create("README.md")
file.create("NEWS")

# Set git up
library(git2r)
init(".")

Once I have this skeleton I fill in the various bits of info about my package in DESCRIPTION, README, R/package.R, and so forth.

After I’ve done some basic hygiene, I can start building my R functions and associated tests.

Writing quality functions

  1. Plan it out
  2. Write the documentation first
  3. Keep testing foremost in your mind - ideally, write unit tests first
  4. Choose sensible defaults for paramaters
  5. (Without strong reason) make your function return an object that has to be specifically assigned to the global environment
  6. Add validation of inputs and error handling
  7. Avoid loops
  8. Make dependencies obvious
  9. Consider how you will be able to test the function
  10. Use the ellipsis (...) argument to pass values through to optional components in functions
#' A function quick description
#' 
#' A more detailed description that can span multiple lines
#' for readability. Covers concepts, typical usages etc.
#'
#' @param  param1 Info about param1 e.g. data type, guidance
#' @param  param2 Info about param1 e.g. data type, guidance
#' @param  ...    Additional values to pass to x, y, z
#'
#' @return returnDT Info about what is returned by the function
#' 
#' @keywords words allowing search
#' @family ifPartOfABundle
#' 
#' @examples
#' # Sample code that illustrates usage
#'
#' @export

myFuncName<- function(param1, param2="Blah", ...){
stopifnot(param1>0, is.character(param2))

# Function code routinely commented with WHY or
# explanations of complex HOW (but consider 
# breaking these up / simplifying)
}

Verification process

Let’s make sure your code is all working (assumes you’ve got unit tests)

library(devtools)

# Build help files
document()

# Run unit tests
test()

# Check against CRAN standards
check()

Each of these steps could identify things to fix. It’s great to get rid of as many ERRORs, WARNINGs, and NOTEs as possible.

Publishing your package

There are a number of locations but I’ll cover two:

  1. CRAN: The central repository of packages, this is ensures a minimum level of quality. It does however have some weird rules (like Title Case demands) and one or two of the gate-keepers aren’t very patient with people. It’s great to do, but be prepared for rework and some comments that could make you wince.
  2. GitHub: GitHub is becoming more and more indexed and well utilised asa location for R packages.

My personal recommendations are to use GitHub as your active development environment, so that people can download the latest version, and periodically attempt to release to CRAN. This helps push up your package quality and makes your code more widely available.

Testing

Back to top

Requirements

  • necessary package: testthat

What’s unit testing?

Unit testing is the technique of writing tests that assess low-level functionality against requirements.

Why should I unit test - I do charts and databases?

In reports, databases, and code we typically encode business rules, conventions, and our own conventions. Bringing these into reusable functions means less code reproduction, no variances between team members, and lower time to change.

All of these are verifiably correct and so can be tested. Therefore, why test them manually, or why risk someone “tweaking” the code and changing the rules without people knowing? Your unit tests save you time over the long run, and protect against unexpected behaviour changes.

Reconciliations can also be done inside your unit tests. If you have “canonical answers”, you can test new transformations against these to ensure you’re consistent.

How to unit test?

I’m not a developer so this may be the wrong approach but here are the scenarios I write tests for:

  1. A single set of values that represent “normal” and expect this matches a correct answer
  2. A dataset that represents “normal” and expecting this to match correct answers
  3. Various permutations of bad input values and expect errors (ideally specific error messages)
  4. Edge cases that cover extreme values or boundaries for any inequalities or conditions
  5. Any bugs or compatibility issues

Also, if you’re struggling to test because it’s got really complex inputs, outputs, or intermediate calculations, consider breaking up the code and rewriting. It’s more likely that things are going to go wrong and you won’t know if something is particularly complex.

How to unit test in R?

Let’s build a sample function:

myfunc<-function(a=1,b=2,c="blah"){
stopifnot(is.numeric(a), is.numeric(b),
            is.character(c))
d<-if(a<0){
          a*b
          }else{
          a/b
          }

e<-paste0(c,d)

return(e)
}

Let’s write some tests (in a file tests/testthat/test-myfunc.r)

# Add a high-level name for group of tests, typically the function name
context("myfunc")

# Simplest test
test_that("Defaults return expected result",{
  result<-myfunc()
  check<-"blah0.5"
  expect_equal(result,check)
})

# Vector test
test_that("Basic vectorisation works",{
  result<-myfunc(a=c(1,1),b=c(2,2), c=c("blah","blah"))
  check<-c("blah0.5","blah0.5")
  expect_equal(result,check)
})

# Non-uniform vectorisation test
test_that("Complex vectorisation works",{
  result<-myfunc(a=c(1,1),b=c(2,2))
  check<-c("blah0.5","blah0.5")
  expect_equal(result,check)
})

# Test a different condition
test_that("Negative a values result in multiplication",{
  result<-myfunc(a=c(-1,-2),b=c(2,2))
  check<-c("blah-2","blah-4")
  expect_equal(result,check)
})

# Test a different condition
test_that("a=0 values result in 0",{
  result<-myfunc(a=0)
  check<-c("blah0")
  expect_equal(result,check)
})

# Test some duff inputs
test_that("errors expectedly",{
  expect_error(myfunc(a="0"))
  expect_error(myfunc(b="0"))
  expect_error(myfunc(c=0))
})

There are a lot more expectation functions you can use and you can make your own.

Continuous integration with Travis-CI

Back to top

With excellent guidance and tooling on making R packages, it’s becoming really easy to make a package to hold your R functionality. This has a host of benefits, not least source control (via GitHub) and unit testing (via the testthat package). Once you have a package and unit tests, a great way of making sure that as you change things you don’t break them is to perform Continuous integration.

What this means is that every time you make a change, your package is built and thoroughly checked for any issues. If issues are found the “build’s broke” and you have to fix it ASAP.

The easiest, cheapest, and fastest way of setting up continuous integration for R stuff is to use Travis-CI, which is free if you use GitHub as a remote server for your code. NB - it doesn’t have to be your only remote server

Account setup

The first thing that needs doing is setting up your accounts and turning on CI for your repositories. The website is pretty good so I won’t go into a lot of detail, but the process is:

  1. sign up for a Travis-CI account
  2. link it to your GitHub account
  3. say which repositories you want to do CI on
  4. add config to your repositories

Additionally, whilst we’re doing this we should be awesome and set up test coverage checks as well. The process is really similar, but for coveralls.io and we only need the one set of config details in our package.

The config file

Then you add a really simple file into the root of your project called .travis.yml.

This should contain, at minimum, the following:

language: r
sudo: required

r_github_packages:
 - jimhester/covr

after_success:
  - Rscript -e 'library(covr);coveralls()'

NB - be careful with the indentation, YAML is very sensitive!

This is the latest set of values that work as it takes into account the recent support for R, the ability to reference github packages, and also Travis’ move towards docker containers which don’t accept sudo commands.

Once you’ve flipped the switch on Travis and Coveralls, every push to GitHub will trigger Travis. Travis will basically build a server with all the requirements needed to run R and build R packages. It’ll then install all your package’s dependencies, check the package for minimum quality standards and also run your testthat tests. Once this is done the final bit tests your code coverage and passes the results to Coverall.

Badge of honour

Great, so you’ve checked the sites and it’s working but you should show the world it’s working! You can get some some snippets of code from each of the sites that you can paste into your README file. These stay up to date with the latest results so that you (and everyone else) can see the status of your package.

Installing Rstudio

Back to top

Server creation

Azure portal, using gallery creation for VM quickcreate quickcreate

Configuring the VM

  1. Get PuTTY
  2. Connect to your VM via the public IP quickcreate
  3. Use the login details in the creation wizard. The password won’t look like you’re typing
  4. Run sudo apt-get update to get the package repository metadata
  5. Run sudo apt-get install r-base to get R. Will have lots of extra associated packages - select Y when prompted
  6. Follow the installation instructions, using the latest file
    • Run sudo apt-get install gdebi-core to enable processing of rstudio installation package
    • Run sudo apt-get install libapparmor1 if using ubuntu
    • Run (latest version of) wget http://download2.rstudio.org/rstudio-server-0.98.1103-amd64.deb
    • Run (latest version of) sudo gdebi rstudio-server-0.98.1103-amd64.deb ## Configuring port (away from 8787) and allowing on Azure
  7. Change rstudio to run on port 80 by amending port in conf file sudo nano /etc/rstudio/rserver.conf
  8. Restart rstudio to apply port change sudo rstudio-server restart
  9. Add port 80 to Azure endpoints for the VM port specify

Installing shiny-server

Back to top

Server creation

Azure portal, using gallery creation for VM quickcreate quickcreate

Configuring the VM

  1. Get PuTTY
  2. Connect to your VM via the public IP quickcreate
  3. Use the login details in the creation wizard. The password won’t look like you’re typing
  4. Run sudo apt-get update to get the package repository metadata
  5. Run sudo apt-get install r-base to get R. Will have lots of extra associated packages - select Y when prompted
  6. Follow the installation instructions, using the latest file
    • Run sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\"" to install shiny in R
    • Run sudo apt-get install gdebi-core to enable processing of shiny-server installation package
    • Run (latest version of) wget http://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.3.0.403-amd64.deb
    • Run (latest version of) sudo gdebi shiny-server-1.3.0.403-amd64.deb

Configuring port (away from 3838) and allowing on Azure

  1. Change shiny-server to run on port 80 by amending port in conf file sudo nano /etc/shiny-server/shiny-server.conf
  2. Restart shiny-server to apply port change sudo restart shiny-server
  3. Add port 80 to Azure endpoints for the VM port specify shiny-server first view